Translated by Google Translate
Technology has transformed traditional educational systems around the globe: integrating digital learning tools into classrooms gives students better opportunities to learn efficiently and lets teachers transfer knowledge more easily. In recent years, smart classrooms have seen many improvements. For instance, the integration of facial emotion recognition (FER) systems has turned the classroom into an emotionally aware space by combining machine intelligence and IoT. This paper provides a consolidated survey of the state of the art in smart classrooms and presents how the application of FER systems takes this concept to the next level.
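Object detection neural network models need to perform reliably in highly dynamic and safety-critical environments such as automated driving or robotics. It is therefore paramount to verify the robustness of detection under unexpected hardware faults, such as soft errors, that can affect a system's perception module. Standard metrics based on average precision produce model vulnerability estimates at the object level rather than at the image level. As we show in this paper, this does not provide an intuitive or representative indicator of the safety-related impact of silent data corruption caused by bit flips in the underlying memory, and it can lead to an over- or underestimation of typical fault-induced hazards. With an eye towards safety-related real-time applications, we propose a new metric, IVMOD (Image-wise Vulnerability Metric for Object Detection), which quantifies vulnerability based on incorrect image-wise object detection due to false positive (FP) or false negative (FN) objects, combined with a severity analysis. The evaluation of several representative object detection models shows that even a single bit flip can lead to a severe silent data corruption event with potentially critical safety implications, e.g., more than 100 FPs generated or up to approximately 90% of true positives (TPs) lost in an image. Furthermore, a single stuck-at fault can affect an entire sequence of images, causing temporally persistent ghost detections that can be mistaken for actual objects (covering approximately 83% of the image). In addition, actual objects in the scene are persistently missed (up to approximately 64% of TPs are lost). Our work establishes a detailed understanding of the safety-related vulnerability of such critical workloads to hardware faults.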
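To illustrate why a single bit flip in memory can be so damaging, the following minimal sketch flips one bit in the IEEE 754 float32 encoding of a value. It is not taken from the paper: the helper name `flip_bit` and the example weight `0.5` are assumptions chosen purely for illustration, and a real fault-injection study would target the tensors of an actual detection model.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit of the float32 encoding of `value`.

    Hypothetical helper for illustration only.
    bit 0 is the least significant mantissa bit, bits 23-30 are the
    exponent, and bit 31 is the sign.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    # XOR with a one-hot mask toggles exactly one bit.
    corrupted = as_int ^ (1 << bit)
    # Reinterpret the corrupted integer as a float32 again.
    (result,) = struct.unpack("<f", struct.pack("<I", corrupted))
    return result


if __name__ == "__main__":
    weight = 0.5  # assumed example value, e.g. a single network weight
    for bit in (0, 23, 30, 31):
        print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)}")
```

Flipping the sign bit (31) turns 0.5 into -0.5, while flipping the most significant exponent bit (30) turns it into 2^127. A low mantissa bit barely changes the value, which is one reason the impact of a fault depends strongly on where it lands.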
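The aetiology of head and neck squamous cell carcinoma (HNSCC) involves multiple carcinogens, such as alcohol, tobacco, and human papillomavirus (HPV). Because HPV infection influences the prognosis, treatment, and survival of patients with HNSCC, it is important to determine the HPV status of these tumours. In this paper, we propose a novel triplet-ranking loss function and a multiple instance learning pipeline for HPV status prediction. This achieves new state-of-the-art performance in HPV detection on two HNSCC cohorts using only routine H&E-stained WSIs. Furthermore, a comprehensive tumour microenvironment analysis was performed, characterising the distinct patterns between HPV+/- HNSCC from genomic, immunological, and cellular perspectives. Correlations with macrophages and connective cells (e.g., fibroblasts) and positive correlations with different T-cell subtypes (e.g., CD8+ T cells) were identified, consistent with clinical findings. Unique gene expression profiles were also identified with respect to HPV infection status, in line with existing findings.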
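Computational pathology (CPath) is an emerging field concerned with the study of tissue pathology through computational algorithms that process and analyse digitized high-resolution images of tissue slides. Recent deep learning developments in CPath have successfully leveraged the sheer volume of raw pixel data in histology images to predict target parameters in the domains of diagnostics, prognostics, treatment sensitivity, and patient stratification, heralding the promise of a new data-driven AI era for both histopathology and oncology. With data serving as the fuel and AI as the engine, CPath algorithms are poised for takeoff and eventual launch into clinical and pharmaceutical orbits. In this paper, we discuss the limitations of CPath and the associated challenges, to help readers distinguish hope from hype, and we provide directions for future research to overcome some of the major challenges faced by this budding field and enable its launch into both orbits.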
Deep learning of artificial neural networks (ANNs) is creating highly functional tools that are, unfortunately, as hard to interpret as their natural counterparts. While it is possible to identify functional modules in natural brains using technologies such as fMRI, we do not have similarly robust methods at our disposal for artificial neural networks. Understanding which parts of an artificial neural network perform which function might help us address a number of vexing problems in ANN research, such as catastrophic forgetting and overfitting. Furthermore, revealing a network's modularity could improve our trust in such networks by making these black boxes more transparent. Here we introduce a new information-theoretic concept that proves useful in understanding and analyzing a network's functional modularity: the relay information $I_R$. The relay information measures how much information groups of neurons that participate in a particular function (modules) relay from inputs to outputs. Combined with a greedy search algorithm, relay information can be used to identify computational modules in neural networks. We also show that the functionality of modules correlates with the amount of relay information they carry.
In recent years, multi-label, multi-class video action recognition has gained significant popularity. While reasoning over temporally connected atomic actions is mundane for intelligent species, standard artificial neural networks (ANNs) still struggle to classify them. In the real world, atomic actions often connect temporally to form more complex composite actions. The challenge lies in recognising composite actions of varying durations while other distinct composite or atomic actions occur in the background. Drawing upon the success of relational networks, we propose methods that learn to reason over the semantic concepts of objects and actions. We empirically show how ANNs benefit from pretraining, relational inductive biases, and unordered set-based latent representations. In this paper we propose deep set conditioned I3D (SCI3D), a two-stream relational network that employs a latent representation of state and a visual representation for reasoning over events and actions. The network learns to reason about temporally connected actions in order to identify all of them in the video. The proposed method achieves an improvement of around 1.49% mAP in atomic action recognition and 17.57% mAP in composite action recognition over an I3D-NL baseline on the CATER dataset.
Direct speech-to-speech translation (S2ST), in which all components can be optimized jointly, has an advantage over cascaded approaches in achieving fast inference with a simplified pipeline. We present a novel two-pass direct S2ST architecture, UnitY, which first generates textual representations and subsequently predicts discrete acoustic units. We enhance model performance through subword prediction in the first-pass decoder, an advanced two-pass decoder architecture design and search strategy, and better training regularization. To leverage large amounts of unlabeled text data, we pre-train the first-pass text decoder on a self-supervised denoising auto-encoding task. Experimental evaluations on benchmark datasets at various data scales demonstrate that UnitY outperforms a single-pass speech-to-unit translation model by 2.5-4.2 ASR-BLEU with a 2.83x decoding speed-up. We show that the proposed methods boost performance even when predicting spectrograms in the second pass. However, predicting discrete units achieves a 2.51x decoding speed-up compared to that case.